Exploiting domain information for Word Sense Disambiguation of medical documents
Authors
Abstract
OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears.
DESIGN: The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context.
MEASUREMENTS: A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset.
RESULTS AND DISCUSSION: The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms extracted by relevance feedback and weighted by inverse document frequency.
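The general idea of combining inverse-document-frequency-weighted key terms with local context can be illustrated with a toy sketch. This is not the authors' system: the corpus, the sense term lists, the overlap-based scoring, and the names `idf_weights`, `score_sense`, and `disambiguate` are all hypothetical, chosen only to show how domain key terms might add evidence that local context alone lacks.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Map each term to log(N / df), where df counts documents containing the term."""
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def score_sense(sense_terms, local_context, key_terms, idf):
    """Sum IDF weights of sense terms found in the local context or the domain key terms."""
    evidence = set(local_context) | set(key_terms)
    return sum(idf.get(term, 0.0) for term in set(sense_terms) & evidence)


def disambiguate(senses, local_context, key_terms, idf):
    """Return the sense label with the highest overlap score (hypothetical scoring rule)."""
    return max(
        senses,
        key=lambda label: score_sense(senses[label], local_context, key_terms, idf),
    )


# Toy data, invented for illustration only.
corpus = [
    ["virus", "infection", "patient"],
    ["temperature", "exposure", "patient"],
    ["respiratory", "virus", "cough"],
]
senses = {
    "common_cold": ["virus", "infection", "respiratory"],
    "cold_temperature": ["temperature", "exposure", "hypothermia"],
}
local_context = ["patient", "reported", "cold"]  # words near the ambiguous term
key_terms = ["virus", "respiratory"]             # assumed domain key terms for the document

idf = idf_weights(corpus)
chosen = disambiguate(senses, local_context, key_terms, idf)
print(chosen)
```

In this toy setup the local context shares no terms with either sense, so the domain key terms supply the only evidence that separates the two candidates.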
Similar resources
DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples
Automatic interpretation of documents is hampered by the fact that language contains terms which have multiple meanings. These ambiguities can still be found when language is restricted to a particular domain, such as biomedicine. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities but are often only able to identify the meanings for a small set of ambiguous terms. DALE...
Examining the role of types of homograph context in determining similarity between documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation, thereby improving the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
Un Sistema de Extracción de Información Basado en Ontologías para Documentos en el Dominio de las Tecnologías de Información / An Ontology-Based Information Extractor for Data-Rich Documents in the Information Technology Domain
This paper presents an information extraction method, suitable for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms combining character-level and token-level similarity measures dealing with non-standardized acronyms and inc...
STRIDER: A Versatile System for Structural Disambiguation
We present STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web directories. The system performs high-quality fully-automated disambiguation by exploiting a novel and versatile structural disambiguation approach.
Comparing Corpora And Lexical Ambiguity
In this paper we compare two types of corpus, focusing on the lexical ambiguity of each of them. The first corpus consists mainly of newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we have used two different disambiguation tools. However, first of all, we must verify the performance of each system in its respective application do...
Journal title:
Volume 19, issue
Pages -
Publication date: 2012